Introduction

The aim of this project is to be able to predict a player’s win percentage based on their individual performance.

Loading Libraries and Data

library(tidymodels)
library(tidyverse)
library(ISLR)
library(ISLR2)
library(discrim)
library(poissonreg)
library(corrr)
library(klaR)
library(ggplot2)
library(ggthemes)
library(pROC)
library(janitor)
library(corrplot)
library(rpart.plot)
library(vip)
library(randomForest)
library(xgboost)
library(yardstick)
tidymodels_prefer()

# These are all the libraries I used to make this project happen!

What is Valorant?

Valorant is a free-to-play, first-person, tactical shooter published by Riot Games. Although officially released in June 2020, the development of the game started in 2014 and did not start their closed beta access until April 2020. Valorant is set in the near future, where players play as one of 20 “agents” – characters based on several countries and cultures around the world. In the main game mode, players are assigned to a team of 5, either attacking or defending, with each team aiming to be the first to win 13 rounds. Not only is there a halftime to switch sides, but there is an overtime if the teams end up in a tie. Agents have unique abilities, each requiring charges, as well as a unique ultimate ability that requires charging through kills, deaths, orbs, or objectives.

Why is this project relevant?

As this game is still relatively young, it can easily be seen that this game is also rapidly changing. Due to factors such as new maps, new agents, or updates that can buff or nerf certain aspects of this game, the meta of Valorant is ever-changing. Moreover, as Valorant is establishing its name as the next up and coming e-sport, professional players are dissecting every bit of the game to get the edge over their opponents. Through this project, we can do a deep dive of which factors lead to a higher win percentage and what trends are currently helping players dominate the game.

Project Timeline

With all of that settled, let’s cover exactly how we’re going to accomplish this. First comes some data clean up and manipulation. Secondly, we’ll get into some exploratory data analysis – seeing the distribution of our outcome variable “win_percent” and the relationships among the many predictor variables. Next, we’ll follow up with our models, specifically linear regression, decision tree, random forest, and boosted tree. After fitting these models to the training set, we will fit the best performing model to the testing data.

Now that we all have a basic understanding of the game, let’s get started!

Data Cleaning

Setting up the data

Before we can even modify the data set, we have to read it in! Although the data was already pretty organized and neat, let’s use the clean_names function to ensure it.

valData <- read.csv('Data/val_stats.csv')

valorant <- valData %>% clean_names() # lower cases all variables, replaces spaces with underscores, etc.

Let’s remove the predictor variables “region”, “name”, and “tag” as they are just characteristics to distinguish the players and not so relevant with predicting win percentage.

valorant <- valorant %>% 
  select(-region, -name, -tag)

We’ll remove the extremities, as they were invalid entries due to either incorrect input or containing a missing value in general.

valorant <- valorant %>% 
  filter(win_percent != 0, 
           win_percent != 100,
         rating == 'Unrated' | rating == 'Bronze 2' | rating == 'Bronze 3' |
         rating == 'Silver 1' | rating == 'Silver 2' | rating == 'Silver 3' |
         rating == 'Gold 1' | rating == 'Gold 2' | rating == 'Gold 3' |
         rating == 'Platinum 1' | rating == 'Platinum 2' | rating == 'Platinum 3' |
         rating == 'Diamond 1' | rating == 'Diamond 2' | rating == 'Diamond 3' |
         rating == 'Immortal 1' | rating == 'Immortal 2' | rating == 'Immortal 3' | 
           rating == 'Radiant')

Here we turn all the nominal variables into factors so that they can be applied to our models.

valorant$agent_1 <- factor(valorant$agent_1)
valorant$agent_2 <- factor(valorant$agent_2)
valorant$agent_3 <- factor(valorant$agent_3)

valorant$gun1_name <- factor(valorant$gun1_name)
valorant$gun2_name <- factor(valorant$gun2_name)
valorant$gun3_name <- factor(valorant$gun3_name)

Same thing for the “rating” variable, and let’s organize the ranks to match how it’s organized in Valorant.

valorant$rating <- factor(valorant$rating,
                          levels = c('Unrated', 'Bronze 2', 'Bronze 3',
                                     'Silver 1', 'Silver 2', 'Silver 3',
                                     'Gold 1', 'Gold 2', 'Gold 3',
                                     'Platinum 1', 'Platinum 2', 'Platinum 3',
                                     'Diamond 1', 'Diamond 2', 'Diamond 3',
                                     'Immortal 1', 'Immortal 2', 'Immortal 3', 
                                     'Radiant'))

These variables were read in as character variables due to the comma in the number, so let’s set them to their appropriate class.

valorant$gun1_kills <- as.numeric(gsub(',', '', valorant$gun1_kills))
valorant$gun2_kills <- as.numeric(gsub(',', '', valorant$gun2_kills))
valorant$first_bloods <- as.numeric(gsub(',', '', valorant$first_bloods))
valorant$kills <- as.numeric(gsub(',', '', valorant$kills))
valorant$deaths <- as.numeric(gsub(',', '', valorant$deaths))
valorant$assists <- as.numeric(gsub(',', '', valorant$assists))
valorant$headshots <- as.numeric(gsub(',', '', valorant$headshots))

That does it for the data clean up! Let’s set the size to lessen the run-time and set the seed to get consistent results!

set.seed(1)

valorant <- sample_n(valorant, size = 10000)

Exploratory Data Analysis

Viewing the Data

To start of the EDA, we’ll take a quick look at the data to see if we missed anything in the data clean up.

head(valorant)
##       rating damage_round headshots headshot_percent aces clutches flawless
## 1 Immortal 1        124.1        51             14.3    0       18        2
## 2 Immortal 3        151.1      1309             29.5    3      183       84
## 3 Immortal 1        129.8        92             22.5    0       17       10
## 4 Immortal 1        120.5       471             20.6    0       95       42
## 5 Immortal 1        142.1       865             20.2    2      150       87
## 6 Immortal 1        123.8       643             41.5    0       89       39
##   first_bloods kills deaths assists kd_ratio kills_round most_kills score_round
## 1           20   109    140      51     0.78         0.6         17       187.6
## 2          255  1712   1690     421     1.01         0.8         29       228.3
## 3           19   151    176      51     0.86         0.7         20       193.7
## 4           65   747    875     406     0.85         0.6         26       181.2
## 5          193  1396   1324     635     1.05         0.7         29       217.6
## 6           84   730    853     266     0.86         0.6         29       185.3
##   wins win_percent agent_1   agent_2   agent_3 gun1_name gun1_head gun1_body
## 1    4        44.4    Jett   Chamber   Killjoy    Vandal        25        71
## 2   51        51.0   Reyna      Raze      Skye    Vandal        41        55
## 3    4        40.0    Sova Brimstone     Viper    Vandal        29        65
## 4   33        58.9   Viper     Astra      Sova    Vandal        30        66
## 5   55        61.1   Reyna      Sage Brimstone   Phantom        31        64
## 6   25        46.3   Astra     Reyna      Sova    Vandal        66        32
##   gun1_legs gun1_kills gun2_name gun2_head gun2_body gun2_legs gun2_kills
## 1         4         45   Phantom        14        83         3         14
## 2         3       1099   Phantom        40        57         3        170
## 3         6         95   Phantom        23        74         3         12
## 4         5        341   Phantom        32        62         6        171
## 5         6        506    Vandal        28        69         4        295
## 6         1        366   Phantom        48        50         2        151
##   gun3_name gun3_head gun3_body gun3_legs gun3_kills
## 1     Ghost        23        61        16         11
## 2     Ghost        41        57         2        142
## 3   Spectre        19        69        11         11
## 4   Spectre        20        76         5         89
## 5   Spectre        21        72         7        172
## 6   Spectre        42        53         5         58

Everything looks good here!

dim(valorant)
## [1] 10000    35

Being that we started with 38 total variables and then removed “region”, “name”, and “tag”, we correctly have 35 variables left. We can see that we have the correct amount of observations specified in our sample_n function! Let’s see how well the numerical variables correlate with each other…

Correlation plot

valorant %>% 
  select(where(is.numeric)) %>% 
  cor() %>% 
  corrplot(type = 'full', diag = FALSE, method = 'square', order = 'AOE',
           tl.col = 'orange', col = COL2('PuOr'))

As we can see, majority of the variables actually go hand-in-hand with one another, with the exception of some negative correlations like gun1_head and gun1_body for example – but that’s to be expected. To no surprise, kills are highly correlated with variables like “clutches”, “flawless”, and “deaths”; but it came as a shock to me to see that there was actually a negative correlation with variables such as “gun2_head” and “win_percent”! Speaking of “win_percent”, let’s inspect the distribution for it – being that it’s the variable we’re predicting.

Histogram

valorant %>% 
  ggplot(aes(win_percent)) +
  geom_histogram(bins = 50)

It can be observed that our response variable, “win_percent”, has a normal distribution – with majority of the players’ win percentages being around 50%. We have a small amount of players winning a very high amount of their games this act, sitting at about the 90% area. Alternatively, we have some players that are not doing so hot, sitting under the 25% area. Maybe it has something to do with the agents they are selecting? Let’s take a look.

Bar Plots

valorant %>% 
  ggplot(aes(y = agent_1)) +
  geom_bar()

As of this act, we can see that the leading and most trending agents are Chamber, Reyna, and Jett. Having personal experience playing this game, I can tell you that this is a very accurate depiction of the current state of the game. These agents have lots of unique abilities and traits that help their team win, especially if played effectively and correctly. But agents don’t define the game entirely… how about the weapons that the players are choosing?

valorant %>% 
  ggplot(aes(y = gun1_name)) +
  geom_bar()

With no surprise, the two most popular guns are the Vandal and Phantom. It is still very up in the air about which gun is objectively better, but I would assume the Vandal gets a lot more usage due to the fact that it can hit a one-shot headshot and a phantom cannot. The up-side to the phantom however, is that it allows the user to be able to control the spray of bullets, where as the Vandal is more effective with burst shots.

valorant %>% 
  ggplot(aes(y = gun2_name)) +
  geom_bar()

For the peoples’ second choice, we see that the Vandal and Phantom are used as a back-up pick (maybe their luck isn’t working with their first choice), but we also see that the Spectre, Operator, and Ghost are popular options. The reason that these guns may be the second choice for a lot of people can simply be due to economical reasons. The Operator, being the most expensive weapon in the game, does not allow the players to be able to buy it very often. With the Spectre, it is a very popular choice in terms of being a cheap but effective weapon – so if a team has to save up a little more money to be able to buy a Vandal or Phantom the next round, they will opt for the Spectre. As for the Ghost, it is a common weapon to hold during the pistol round – the first round of every game and the first round after every half.

valorant %>% 
  ggplot(aes(y = gun3_name)) +
  geom_bar()

Lastly, we look at the peoples’ third choice. We can infer that the large variation of choices can be the result of a great amount of factors, all the way from economical reasons (similar to the second choice) to subjective preference, where people just find it fun to use (after all it is a video game). Commonly picked as the third weapon, we see the Spectre (probably having similar reason as when it was the second choice), Phantom and Operator, and the three most popular pistols: Sheriff, Classic, and Ghost. With such a large range of options for a third choice, let’s look at how the players perform after choosing such weapons…

Box and Whisker Plots

valorant %>% 
  ggplot(aes(x = score_round, y = gun3_name)) +
  geom_boxplot()

Notably in this graph, we can see that the average score per round across all the levels of guns sits around the 220 area. Unexpectedly to me, although not picked quite nearly as often, the average score per round for people who chose either the Bulldog or Ares have a higher average than those who chose the Spectre. This could simply be the reason because there are not a lot of observations for these guns and since the Spectre has a higher pick rate, the average is brought down. So we’ve seen how dangerous the players can be, but just how precise is their shot?

valorant %>% 
  ggplot(aes(x = headshot_percent, y = rating)) +
  geom_boxplot()

From this graph, we can see that the higher ranks stay quite close to each other at around a headshot percent of 25. Naturally, as we climb the ranks, this percentage will increase, but actually it is striking to me to see that Immortal 3 players have a higher average than Radiant players (the top rank of the game – top 500 in the region). Although this is the case, it can also be seen that Radiant players are more consistent, since their Q1 and Q3 are closer to the mean than that of the Immortal 3 players. Overall, these players are hitting about one headshot per five shots total.

valorant %>% 
  ggplot(aes(x = damage_round, y = gun1_name)) +
  geom_boxplot()

Let’s try to settle the debate ourselves, which gun is better… the Vandal or the Phantom? As we can see in this box and whisker plot, Vandal users are dealing a higher average amount of damage per round than Phantom users. Similar to our Radiant and Immortal 3 comparison in the last box and whisker plot, Phantom users are more consistent while Vandal users cover a large range of damage per round. So I’ll ask you then… would you rather have a more consistent player or one that can either pop-off or perform underwhelmingly?

Model Building

We have finally reached model building! As mentioned previously, I will be making a linear regression model, a decision tree, a random forest, and a boosted tree. First, we start off by splitting the data. I chose to do an 80/20 split between the training set and the testing set, and then also stratified it to the variable we are predicting: win_percent.

Data Split

After splitting we can recognize that the training data correctly contains xxx observations (80% of observations in our sample size) and the testing data correctly contains xxx observations (20% of observations in our sample size).

valSplit <- initial_split(valorant, prop = 0.8, strata = win_percent)
valTrain <- training(valSplit)
valTest <- testing(valSplit)

Folding the Data

Next I implement the k-fold cross-validation method by using 5 folds and stratifying it to our predicted variable.

valFold <- vfold_cv(valTrain, strata = win_percent, v = 5)

Creating the Recipe

Following up the folds, I create the recipe that we are going to be adding to the future models.

valRecipe <- recipe(win_percent ~ rating + damage_round + headshots + headshot_percent + 
                      aces + clutches + flawless + first_bloods + 
                      kills + deaths + assists + kd_ratio + 
                      kills_round + most_kills + score_round + 
                      agent_1 + agent_2 + agent_3 + 
                      gun1_name + gun1_head + gun1_body + gun1_legs + gun1_kills + 
                      gun2_name + gun2_head + gun2_body + gun2_legs + gun2_kills + 
                      gun3_name + gun3_head + gun3_body + gun3_legs + gun3_kills, data = valTrain) %>%  
                                                                                # predict win_percent with all variables
  step_dummy(all_nominal_predictors()) %>%                                      # dummy code nominal variables
  step_other(agent_1, agent_2, agent_3, gun1_name, gun2_name, gun3_name) %>%    # pool infrequently occurring values into an "other" category
  step_novel(agent_1, agent_2, agent_3, gun1_name, gun2_name, gun3_name) %>%    # assign a previously unseen factor level to a new value
  step_center(all_numeric_predictors()) %>%                                     # center all numeric predictors
  step_scale(all_numeric_predictors()) %>%                                      # scale all numeric predictors
  step_pca(headshots, most_kills, kd_ratio,
         gun1_head, gun1_body, gun1_legs,
         gun2_head, gun2_body, gun2_legs,
         gun3_head, gun3_body, gun3_legs,
         gun1_kills, gun2_kills, gun3_kills,
         kills, assists, deaths, 
         headshot_percent,
         flawless, clutches, aces, first_bloods,
         score_round, damage_round, kills_round,
         num_comp = 3) %>%                                                      # convert numeric data into one or more principal components
  step_nzv(all_predictors())                                                    # remove variables that are highly sparse and unbalanced
valRecipe2 <- recipe(win_percent ~ rating + damage_round + headshots + headshot_percent + 
                      aces + clutches + flawless + first_bloods + 
                      kills + deaths + assists + kd_ratio + 
                      kills_round + most_kills + score_round + 
                      agent_1 + agent_2 + agent_3 + 
                      gun1_name + gun1_head + gun1_body + gun1_legs + gun1_kills + 
                      gun2_name + gun2_head + gun2_body + gun2_legs + gun2_kills + 
                      gun3_name + gun3_head + gun3_body + gun3_legs + gun3_kills, data = valTrain) %>%  
                                                                                # predict win_percent with all variables
  step_dummy(all_nominal_predictors()) %>%                                      # dummy code nominal variables
  step_other(agent_1, agent_2, agent_3, gun1_name, gun2_name, gun3_name) %>%    # pool infrequently occurring values into an "other" category
  step_novel(agent_1, agent_2, agent_3, gun1_name, gun2_name, gun3_name) %>%    # assign a previously unseen factor level to a new value
  step_center(all_predictors()) %>%                                             # center all numeric predictors
  step_scale(all_predictors()) %>%                                              # scale all numeric predictors
  step_pca(headshots, most_kills, kd_ratio,
         gun1_head, gun1_body, gun1_legs,
         gun2_head, gun2_body, gun2_legs,
         gun3_head, gun3_body, gun3_legs,
         gun1_kills, gun2_kills, gun3_kills,
         kills, assists, deaths, 
         headshot_percent,
         flawless, clutches, aces, first_bloods,
         score_round, damage_round, kills_round,
         num_comp = 3) %>%                                                      # convert numeric data into one or more principal components
  step_nzv(all_predictors())                                                    # remove variables that are highly sparse and unbalanced

Model Fits

Let’s get started with the first model…

Linear Regression

lm_spec <- linear_reg() %>% 
  set_mode('regression') %>% 
  set_engine('lm')
# set the mode to regression and the engine to linear model

lm_fit <- fit(lm_spec, win_percent ~ rating + damage_round + headshots + headshot_percent + 
                      aces + clutches + flawless + first_bloods + 
                      kills + deaths + assists + kd_ratio + 
                      kills_round + most_kills + score_round + 
                      agent_1 + agent_2 + agent_3 + 
                      gun1_name + gun1_head + gun1_body + gun1_legs + gun1_kills + 
                      gun2_name + gun2_head + gun2_body + gun2_legs + gun2_kills + 
                      gun3_name + gun3_head + gun3_body + gun3_legs + gun3_kills,
                    data = valTrain)
# nothing to complicated about linear regression, fit the specified model, add a formula, and fit to the training data
lm_results <- augment(lm_fit, new_data = valTrain) %>% 
  rmse(truth = win_percent, estimate = .pred)

lm_results
## # A tibble: 1 × 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard        7.56
# How did it do?

The linear regression model acquired an RMSE of 7.26781, not bad but could be better.

Now for the decision tree!

Decision Tree

tree_spec <- decision_tree() %>%
  set_engine("rpart")
# Using the rpart engine to help make the decision tree

reg_tree_spec <- tree_spec %>%
  set_mode("regression")
# Specify the tree as regression

reg_tree_fit <- fit(reg_tree_spec, 
                    win_percent ~ rating + damage_round + headshots + headshot_percent + 
                      aces + clutches + flawless + first_bloods + 
                      kills + deaths + assists + kd_ratio + 
                      kills_round + most_kills + score_round + 
                      agent_1 + agent_2 + agent_3 + 
                      gun1_name + gun1_head + gun1_body + gun1_legs + gun1_kills + 
                      gun2_name + gun2_head + gun2_body + gun2_legs + gun2_kills + 
                      gun3_name + gun3_head + gun3_body + gun3_legs + gun3_kills,
                    data = valTrain)
# Fit the model to the training data set

reg_tree_fit %>% 
  extract_fit_engine() %>% 
  rpart.plot()
## Warning: Cannot retrieve the data used to build the model (so cannot determine roundint and is.binary for the variables).
## To silence this warning:
##     Call rpart.plot with roundint=FALSE,
##     or rebuild the rpart model with model=TRUE.

# Plot the tree!
reg_tree_wf <- workflow() %>%
  add_model(reg_tree_spec %>% set_args(cost_complexity = tune())) %>%
  add_formula(win_percent ~ rating + damage_round + headshots + headshot_percent + 
                      aces + clutches + flawless + first_bloods + 
                      kills + deaths + assists + kd_ratio + 
                      kills_round + most_kills + score_round + 
                      agent_1 + agent_2 + agent_3 + 
                      gun1_name + gun1_head + gun1_body + gun1_legs + gun1_kills + 
                      gun2_name + gun2_head + gun2_body + gun2_legs + gun2_kills + 
                      gun3_name + gun3_head + gun3_body + gun3_legs + gun3_kills)
# Create a workflow for the decision tree

param_grid <- grid_regular(cost_complexity(range = c(-4, -1)), levels = 10)

tune_res <- tune_grid(reg_tree_wf, resamples = valFold, grid = param_grid, metrics = metric_set(rmse, rsq)) # Trying to find the best value for the parameters
## ! Fold1: internal: A correlation computation is required, but `estimate` is constant and ha...
## ! Fold2: internal: A correlation computation is required, but `estimate` is constant and ha...
## ! Fold3: internal: A correlation computation is required, but `estimate` is constant and ha...
## ! Fold4: internal: A correlation computation is required, but `estimate` is constant and ha...
## ! Fold5: internal: A correlation computation is required, but `estimate` is constant and ha...
autoplot(tune_res)

It can be seen that as we increase the cost-complexity parameter, the RSQ will eventually peak, but the RMSE is at its lowest point.

best_complexity <- select_best(tune_res, metric = "rmse") # Selecting the best value for RMSE

reg_tree_final <- finalize_workflow(reg_tree_wf, best_complexity) # Creating a workflow using the best parameter value

reg_tree_final_fit <- fit(reg_tree_final, data = valTrain) # Fitting the model onto the training data set

reg_tree_final_fit %>%
  extract_fit_engine() %>%
  rpart.plot()
## Warning: Cannot retrieve the data used to build the model (so cannot determine roundint and is.binary for the variables).
## To silence this warning:
##     Call rpart.plot with roundint=FALSE,
##     or rebuild the rpart model with model=TRUE.

# Plot the tree again!
reg_tree_result <-augment(reg_tree_final_fit, new_data = valTest) %>%
  rmse(truth = win_percent, estimate = .pred) # Let's check how well our model did

reg_tree_result
## # A tibble: 1 × 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard        8.11

Our model got an RMSE of 9.295308 – this model does not seem that great.

Moving onto Random Forest!

Random Forest

rf_spec <- rand_forest(mtry = .cols()) %>%
  set_engine("randomForest", importance = TRUE) %>%
  set_mode("regression")
# Creating a Random Forest model

rf_fit <- fit(rf_spec, win_percent ~ rating + damage_round + headshots + headshot_percent + 
                      aces + clutches + flawless + first_bloods + 
                      kills + deaths + assists + kd_ratio + 
                      kills_round + most_kills + score_round + 
                      agent_1 + agent_2 + agent_3 + 
                      gun1_name + gun1_head + gun1_body + gun1_legs + gun1_kills + 
                      gun2_name + gun2_head + gun2_body + gun2_legs + gun2_kills + 
                      gun3_name + gun3_head + gun3_body + gun3_legs + gun3_kills, 
              data = valTrain)
# Fit the model to the training data set
rf_result <- augment(rf_fit, new_data = valTest) %>%
  rmse(truth = win_percent, estimate = .pred)

rf_result
## # A tibble: 1 × 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard        7.65
augment(rf_fit, new_data = valTest) %>%
  ggplot(aes(win_percent, .pred)) +
  geom_abline() +
  geom_point(alpha = 0.5)

# Let's see how well the model is able to predict the win_percent

By taking a look at the graph, it’s obvious that this model did not do a very good job at predicting the true value of win_percent.

vip(rf_fit) # Which predictors were most important at helping predict the true value?

In general, we can see that predictors such as “kd_ratio” and “rating” were quite important in helping our model predict the win percentage of a player. On the other hand, variables such as “kills” or “damage_round” don’t necessarily mean a player is going to have a high win percentage.

Last but not least… Boosted Tree!

Boosted Tree

boost_spec <- boost_tree(trees = 5000, tree_depth = 4) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

boost_fit <- fit(boost_spec, win_percent ~ ., data = valTrain)

boost_result <-augment(boost_fit, new_data = valTest) %>%
  rmse(truth = win_percent, estimate = .pred)

boost_result
## # A tibble: 1 × 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard        2.76

Let’s put the results together to get an easier look of which model did the best shall we?

bind_rows(lm_results, reg_tree_result, rf_result, boost_result)
## # A tibble: 4 × 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard        7.56
## 2 rmse    standard        8.11
## 3 rmse    standard        7.65
## 4 rmse    standard        2.76

Conclusion

In conclusion, after our detailed analysis of Valorant, the model that performed the best at predicting the win percentage of a player given their individual statistics is Boosted Tree!

There are a lot of ways in which I could have improved upon in this project, such as building more models (Support Vector Machines, ridge regression, neural networks, etc.) or simply manipulating the variables a little bit more. As I worked through this project, a couple of my models failed and I ultimately was not able to pull results from it. Even more so, the models that were able run performed very poorly. As of right now, it is hard to determine what is the cause for the poor performance, but I think I can confidently say that this project did not have the results I wished for. Taking this project as a learning lesson, I hope to work on this project in the future and I hope to eventually get it right.

Overall, this project felt very rewarding to work on every bit of the way. It was a good chance to work on my skills and build some experience with machine learning and projects in general – and I got to work on a topic I am heavily interested in! Thank you for following along and I hope you enjoyed going through this project with me!

In the wise words of my random teammate, “gg go next.”